Generation of Compound Words in Statistical Machine Translation into Compounding Languages
نویسندگان
چکیده
منابع مشابه
Generation of Compound Words in Statistical Machine Translation into Compounding Languages
In this article we investigate statistical machine translation (SMT) into Germanic languages, with a focus on compound processing. Our main goal is to enable the generation of novel compounds that have not been seen in the training data. We adopt a split-merge strategy, where compounds are split before training the SMT system, and merged after the translation step. This approach reduces sparsit...
متن کاملProductive Generation of Compound Words in Statistical Machine Translation
In many languages the use of compound words is very productive. A common practice to reduce sparsity consists in splitting compounds in the training data. When this is done, the system incurs the risk of translating components in non-consecutive positions, or in the wrong order. Furthermore, a post-processing step of compound merging is required to reconstruct compound words in the output. We p...
متن کاملStatistical Machine Translation of German Compound Words
German compound words pose special problems to statistical machine translation systems: the occurence of each of the components in the training data is not sufficient for successful translation. Even if the compound itself has been seen during training, the system may not be capable of translating it properly into two or more words. If German is the target language, the system might generate on...
متن کاملDefinite noun phrases in statistical machine translation into Scandinavian languages
The Scandinavian languages have an unusual structure of definite noun phrases (NPs), with a noun suffix as one possibility of expressing definiteness, which is problematic for statistical machine translation from languages with different NP structures. We show that translation can be improved by simple source side transformations of definite NPs, for translation from English and Italian, into D...
متن کاملTranslation of unknown words in phrase-based statistical machine translation for languages of rich morphology
This paper proposes a method for handling out-of-vocabulary (OOV) words that cannot be translated using conventional phrase-based statistical machine translation (SMT) systems. For a given OOV word, lexical approximation techniques are utilized to identify spelling and inflectional word variants that occur in the training data. All OOV words in the source sentence are replaced with appropriate ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Computational Linguistics
سال: 2013
ISSN: 0891-2017,1530-9312
DOI: 10.1162/coli_a_00162